# Multimodal image-text understanding
Qwen2.5 VL 3B Instruct GGUF
Qwen2.5-VL-3B-Instruct is a 3B-parameter multimodal model for image-text-to-text tasks, provided here in GGUF format for use with the vision support in llama.cpp.
Image-to-Text English
Mungert
10.44k
8
Gme Qwen2 VL 2B Instruct GGUF
A quantized version of a multimodal model that supports both English and Chinese, suited to image-text-to-text tasks.
Image-to-Text Supports Multiple Languages
sinequa
350
0